Test-driven Assessment of [R2]RML Mappings to Improve Dataset Quality
Authors
Abstract
RDF dataset quality assessment is currently performed primarily after the data is published. Its results are rarely incorporated, and when they are, the corresponding adjustments to the dataset are applied manually. In the case of (semi-)structured data (e.g., CSV, XML), the violations often originate in the mappings that specify how the RDF dataset will be generated. Thus, we suggest shifting the quality assessment from the RDF dataset to the mapping definitions that generate it. The proposed test-driven approach for assessing mappings relies on RDFUnit test cases applied over mappings specified with RML. Our evaluation is applied to different cases, e.g., DBpedia, and indicates that the overall quality of an RDF dataset is quickly and significantly improved.
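The idea of testing the mapping rather than the generated data can be illustrated with a minimal Python sketch. It is not the RDFUnit or RML API: real RML mappings are RDF documents and RDFUnit test cases are SPARQL-based, whereas here both are reduced to plain dictionaries. The predicate IRIs, field names, `SCHEMA_RANGES`, `MAPPING_RULES`, and `assess_mapping` are all hypothetical names introduced for illustration.

```python
# Hypothetical, simplified stand-ins: plain dicts replace RML mapping
# documents and schema axioms purely to illustrate the idea.

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed schema knowledge: the expected literal datatype per predicate.
SCHEMA_RANGES = {
    "http://example.org/birthDate": XSD + "date",
    "http://example.org/height": XSD + "double",
}

# Assumed mapping: each rule states which source field feeds which
# predicate and which datatype the generated literals will carry.
MAPPING_RULES = [
    {"predicate": "http://example.org/birthDate",
     "source": "born", "datatype": XSD + "date"},
    {"predicate": "http://example.org/height",
     "source": "height_cm", "datatype": XSD + "string"},
]


def assess_mapping(rules, schema_ranges):
    """Return one violation per rule whose datatype disagrees with the schema."""
    violations = []
    for rule in rules:
        expected = schema_ranges.get(rule["predicate"])
        # Predicates unknown to the schema are skipped, not flagged.
        if expected is not None and rule["datatype"] != expected:
            violations.append({
                "predicate": rule["predicate"],
                "expected": expected,
                "found": rule["datatype"],
            })
    return violations


violations = assess_mapping(MAPPING_RULES, SCHEMA_RANGES)
for v in violations:
    print(f"{v['predicate']}: expected {v['expected']}, found {v['found']}")
```

In this sketch the check runs once per mapping rule, before any data is generated, so correcting the single offending rule would correct every triple that rule produces.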
Similar resources
Assessing and Refining Mappings to RDF to Improve Dataset Quality
RDF dataset quality assessment is currently performed primarily after data is published. However, there is neither a systematic way to incorporate its results into the dataset nor the assessment into the publishing workflow. Adjustments are manually, but rarely, applied. Nevertheless, the root of the violations which often derive from the mappings that specify how the RDF dataset will be genera...
DBpedia Mappings Quality Assessment
The root of schema violations for RDF data generated from (semi-)structured data often derives from mappings, which are repeatedly applied and specify how an RDF dataset is generated. The DBpedia dataset, which derives from Wikipedia infoboxes, is no exception. To mitigate the violations, we proposed in previous work to validate the mappings which generate the data, instead of validating the g...
A PCA/ICA based Fetal ECG Extraction from Mother Abdominal Recordings by Means of a Novel Data-driven Approach to Fetal ECG Quality Assessment
Background: Fetal electrocardiography is a developing field that provides valuable information on fetal health during pregnancy. Early diagnosis and treatment of fetal heart problems gives the infant a better chance of survival. Objective: Here, we extract fetal ECG from maternal abdominal recordings and detect R-peaks in order to recognize fetal heart rate. In the next step, we find a be...
GENETIC PROGRAMMING AND MULTIVARIATE ADAPTIVE REGRESION SPLINES FOR PRIDICTION OF BRIDGE RISKS AND COMPARISION OF PERFORMANCES
In this paper, two different data-driven models, genetic programming (GP) and multivariate adaptive regression splines (MARS), have been adopted to create the models for prediction of bridge risk score. Input parameters of bridge risks consist of safe risk rating (SRR), functional risk rating (FRR), sustainability risk rating (SUR), environmental risk rating (ERR) and target output. The total ...
Real-time quality monitoring in debutanizer column with regression tree and ANFIS
A debutanizer column is an integral part of any petroleum refinery. Online composition monitoring of debutanizer column outlet streams is highly desirable in order to maximize the production of liquefied petroleum gas. In this article, data-driven models for debutanizer column are developed for real-time composition monitoring. The dataset used has seven process variables as inputs and the outp...
Publication date: 2015